Incident Reports
This directory contains a record of all the operational incidents that have occurred within our infrastructure. Proper documentation of each incident helps the team in the following ways:
- Understanding Root Causes: Detailed reports allow us to understand what caused the incident.
- Pattern Recognition: Over time, we might see patterns that hint at deeper infrastructure or code issues.
- Improved Response: By understanding past incidents, we can respond to new ones more effectively.
- Knowledge Transfer: New team members can refer to these reports to understand past issues.
TODO
The incident reports in this directory seem somewhat redundant now that a newer system is in place. Although there is value in archiving the older records here, they remain accessible through the repo tag Pre-Docusaurus. The box below serves as a reminder for readers. Perhaps we should include a simple page in the root of this folder that notes the existence of these older reports.
Older incident reports
Older incident reports that are not available in current tracking systems are still accessible using the repo tag Pre-Docusaurus.
How to Document an Incident
- Create a New File: For each new incident, create a new markdown file with a descriptive name. Format: YYYYMMDD_description.md
- Follow the Template: Use the template provided in template_incident.md as a reference.
- Link in This Readme: Add a link to the new incident report in the list below.
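The steps above can be sketched as a small helper script. This is only an illustration, not an existing tool in this repository: the function names, the lowercase underscore-separated slug, and the exact shape of the markdown link are all assumptions. Only the YYYYMMDD_description.md format and the template_incident.md file name come from this README.

```python
import re
import shutil
from datetime import date
from pathlib import Path

# Template file name taken from this README.
TEMPLATE = "template_incident.md"


def incident_filename(day: date, description: str) -> str:
    """Build a YYYYMMDD_description.md file name.

    Assumption: the description is lowercased and every run of
    non-alphanumeric characters becomes a single underscore.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return f"{day:%Y%m%d}_{slug}.md"


def readme_entry(day: date, title: str) -> str:
    """Build the list line to add to the Incident List.

    Assumption: each entry is a markdown link whose text is
    "Title (YYYY-MM-DD)" and whose target is the report file.
    """
    return f"- [{title} ({day.isoformat()})]({incident_filename(day, title)})"


def create_report(directory: str, day: date, title: str) -> Path:
    """Copy the template to a new report file, refusing to overwrite."""
    target = Path(directory) / incident_filename(day, title)
    if target.exists():
        raise FileExistsError(target)
    shutil.copyfile(Path(directory) / TEMPLATE, target)
    return target


if __name__ == "__main__":
    # Example with a hypothetical incident title; prints the line to paste
    # at the top of the Incident List (newest first).
    print(readme_entry(date.today(), "Example Incident"))
```

Calling create_report(".", date.today(), "Example Incident") from this directory would copy the template to a correctly named file, after which the line printed by readme_entry can be added at the top of the list.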
Incident List
- Template Incident (YYYY-MM-DD)
- Beacon Fluctuation (2025-03-20)
- RPC Data Inconsistency (2025-03-07)
- Price Deviation (2024-12-18)
- Excessive Updates (2024-12-13)
- Manta Low Balance Alert RPC Failure (2024-08-21)
- Fantom RPC issue (2024-08-19)
- Black Swan Event of August 5th 2024 (2024-08-05)
- Opsgenie Failure (2024-04-30)
- Centurion SignedDataWorker Issue (2024-04-06)
- Unwanted walletsWorker: Wallet balance (Stage) alert on Polygon testnet (2024-04-06)
- ETH/USD Out of Deviation on Blast Sepolia Testnet (Stage) (2024-04-02)
- Multiple Shadow Alerts Deviation on Kava (2024-03-18)
- BRL-USD on Eth-Sepolia (Testnet) Heartbeat Skip (2024-03-15)
- Persisting Stuck Collector Alerts on Mantle and Blast (2024-03-15)
- Stuck Collector and Unable to Get Block Alerts on Mantle (2024-03-15)
- No Deviation Alerts on Blast (2024-03-14)
- Unable to retrieve logs in the order Payable on rsk and bsc (2024-03-12)
- Unable to Get Balance on Multiple Chains (2024-03-09)
- Dead Gateway Alerts (2024-01-24)
- Missing Value Alerts (2024-01-22)
- Linea Sequencer Update (2024-01-22)
- Linea Public RPC Issue (2024-01-15)
- Stuck Collector Mantle (2024-01-12)
- Mantle Event Collector Lags (2024-01-03)
- Kava Dead Public RPC (2024-01-02)
- Moonbeam Moonriver and Mantle Outage (2023-12-26)
- Stuck Event Collector on Mantle Staging Environment (2023-12-21)
- Self-Funded Feed Heartbeat Exceeded (2023-12-16)
- Arbitrum Outage (2023-12-15)
- Datafeed ID and DAPI Name Mismatch Alerts (2023-12-09)
- Unnecessary price updates (2023-12-04)
- Low and critical balance alerts (2023-12-02)
- Self-funded Feed is not updating (2023-12-01)
- Missing Nodary Value Alerts on All Chains (2023-11-28)
- Multiple Unable to Get Balance Alerts on Linea (2023-11-17)
- Low Balance for a Depository Contract on Arbitrum (2023-11-14)
- Transaction Retrieval Issues (2023-11-06)
- Persisting Genuine Stuck Collector Alert (2023-09-30)
- Low Balance Detected for Top-up Wallet (2023-09-27)
- No Off-chain Prices in Sentinels Dashboard for Beacon Sets (2023-09-26)
- Cached Grouped DAPIs Worker Failed to Run Correctly (2023-09-21)
(Add more incidents as they occur, newest first)